@codeflash-ai codeflash-ai bot commented Oct 24, 2025

📄 48% (0.48x) speedup for UsageInfo.serialize_model in src/mistralai/models/usageinfo.py

⏱️ Runtime : 538 microseconds → 363 microseconds (best of 352 runs)
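
As a quick check of the headline figure, the speedup can be recomputed from the two runtimes above. This is a minimal sketch that assumes the speedup is defined as original time divided by optimized time, minus one; the variable names are illustrative only.

```python
# Runtimes reported above, in microseconds.
original_us = 538.0
optimized_us = 363.0

# Assumed definition: relative speedup = original / optimized - 1.
speedup = original_us / optimized_us - 1.0
print(f"{speedup:.1%}")  # prints 48.2%
```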

📝 Explanation and details

The optimized code achieves a 48% speedup through five changes:

Primary Optimizations:

  1. Set-based lookups instead of lists: Converting optional_fields, nullable_fields, and null_default_fields from lists to sets gives average O(1) membership tests instead of O(n). These lookups run once per model field, so the saving applies on every loop iteration.

  2. Eliminated redundant set operations: The original code used self.__pydantic_fields_set__.intersection({n}), which builds a one-element set and a new result set for every field check. The optimized version uses the direct membership test n in self_fields_set, avoiding that set construction overhead.

  3. Combined dictionary operations: Using serialized.pop(k, None) instead of separate get() and pop() calls reduces dictionary lookups from 2 to 1 per field.

  4. Efficient bulk update: Replacing the manual loop through remaining serialized items with m.update(serialized) leverages Python's optimized C implementation for dictionary merging.

  5. Cached attribute access: Storing self.__pydantic_fields_set__ and type(self).model_fields in local variables eliminates repeated attribute lookups.
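
The five changes above can be sketched side by side. This is a simplified stand-in, not the SDK's actual `serialize_model`: it works on plain dicts, assumes fields have no aliases, and `merge_original`, `merge_optimized`, and the local `UNSET_SENTINEL` are hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, List, Set

UNSET_SENTINEL = object()  # hypothetical stand-in for the SDK's sentinel value


def merge_original(serialized: Dict[str, Any], field_names: List[str],
                   fields_set: Set[str], optional_fields: List[str],
                   nullable_fields: List[str]) -> Dict[str, Any]:
    """List-based variant following the pattern described as the original."""
    serialized = dict(serialized)  # copy so the caller's dict is untouched
    m: Dict[str, Any] = {}
    for k in field_names:
        val = serialized.get(k)   # first dictionary lookup
        serialized.pop(k, None)   # second dictionary lookup
        # Membership tests on lists scan element by element.
        optional_nullable = k in optional_fields and k in nullable_fields
        # Builds a throwaway one-element set plus a result set per field.
        is_set = bool(fields_set.intersection({k}))
        if val is not None and val is not UNSET_SENTINEL:
            m[k] = val
        elif val is not UNSET_SENTINEL and (
            k not in optional_fields or (optional_nullable and is_set)
        ):
            m[k] = val
    for k, v in serialized.items():  # manual copy of leftover (extra) keys
        m[k] = v
    return m


def merge_optimized(serialized: Dict[str, Any], field_names: List[str],
                    fields_set: Set[str], optional_fields: List[str],
                    nullable_fields: List[str]) -> Dict[str, Any]:
    """Same logic with sets, a single pop, direct membership, and update()."""
    serialized = dict(serialized)
    optional = set(optional_fields)   # sets give average O(1) membership tests
    nullable = set(nullable_fields)
    m: Dict[str, Any] = {}
    for k in field_names:
        val = serialized.pop(k, None)  # one lookup instead of get() + pop()
        optional_nullable = k in optional and k in nullable
        is_set = k in fields_set       # no temporary sets
        if val is not None and val is not UNSET_SENTINEL:
            m[k] = val
        elif val is not UNSET_SENTINEL and (
            k not in optional or (optional_nullable and is_set)
        ):
            m[k] = val
    m.update(serialized)               # bulk merge of the leftover keys
    return m


# Usage: both variants should agree on the same input.
names = ["prompt_tokens", "completion_tokens", "total_tokens", "prompt_audio_seconds"]
data = {"prompt_tokens": 1, "completion_tokens": None,
        "prompt_audio_seconds": None, "extra": "x"}
args = (data, names, {"prompt_tokens", "prompt_audio_seconds"},
        names, ["prompt_audio_seconds"])
assert merge_original(*args) == merge_optimized(*args)
```

In this sketch the unset optional fields are dropped, the explicitly set nullable field keeps its None, and the extra key passes through unchanged in both variants.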

Performance Impact by Test Case:

  • Basic cases (roughly 17-27% faster): Benefit primarily from set lookups and reduced attribute access
  • Extra fields (25-101% faster): The bulk update optimization shines when many extra fields are present, reaching 101% with 1000 extra fields
  • Large scale mixed (77% faster): Combines benefits of all optimizations when processing many fields with large values

The optimizations are particularly effective for models with multiple fields and extra attributes, making this ideal for high-throughput serialization scenarios.
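
To get a feel for why the extra-field cases gain the most, the manual copy loop can be timed against `dict.update` in isolation. This is a rough micro-benchmark sketch, not the measurement Codeflash ran; `copy_with_loop` and `copy_with_update` are illustrative names, and absolute timings will vary by machine and Python version.

```python
import timeit

# Mirrors the shape of the 1000-extra-field test case.
extras = {f"extra_{i}": i for i in range(1000)}


def copy_with_loop(src):
    """Copy key by key, as in the original trailing loop."""
    out = {}
    for k, v in src.items():
        out[k] = v
    return out


def copy_with_update(src):
    """Copy in one call, as in the optimized version."""
    out = {}
    out.update(src)
    return out


loop_s = timeit.timeit(lambda: copy_with_loop(extras), number=2000)
update_s = timeit.timeit(lambda: copy_with_update(extras), number=2000)
print(f"loop: {loop_s:.4f}s  update: {update_s:.4f}s")
```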

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 245 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from typing import Any, Dict, Optional

import pydantic
# imports
import pytest  # used for our unit tests
from mistralai.models.usageinfo import UsageInfo
from pydantic import ConfigDict, model_serializer


# Simulate UNSET and UNSET_SENTINEL as used in the original code
class _UnsetType:
    pass
UNSET_SENTINEL = _UnsetType()
UNSET = UNSET_SENTINEL  # For compatibility

# Simulate OptionalNullable as a type alias (for test purposes)
OptionalNullable = Optional

# Simulate BaseModel from mistralai.types
class BaseModel(pydantic.BaseModel):
    pass


# Helper for serialization handler
def default_handler(model):
    # Simulate Pydantic's model.dict() with all fields, including unset
    # For fields with UNSET, include them as UNSET_SENTINEL
    d = {}
    for name, field in model.model_fields.items():
        val = getattr(model, name, UNSET_SENTINEL)
        # For fields not set, Pydantic would use default, unless excluded
        d[field.alias or name] = val
    # Include extra fields if present
    if hasattr(model, "__pydantic_extra__"):
        d.update(getattr(model, "__pydantic_extra__"))
    return d

# -----------------------------
# Unit Tests for serialize_model
# -----------------------------

# 1. Basic Test Cases

def test_basic_all_defaults():
    """Test serialization with all default values."""
    u = UsageInfo()
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 8.82μs -> 7.16μs (23.3% faster)

def test_basic_all_fields_set():
    """Test serialization with all fields explicitly set."""
    u = UsageInfo(prompt_tokens=10, completion_tokens=20, total_tokens=30, prompt_audio_seconds=40)
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 6.90μs -> 5.43μs (26.9% faster)

def test_basic_partial_fields_set():
    """Test serialization with some fields set, others default."""
    u = UsageInfo(prompt_tokens=5)
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 7.51μs -> 6.21μs (20.9% faster)

def test_basic_prompt_audio_seconds_set_to_none():
    """Test serialization when prompt_audio_seconds is explicitly set to None."""
    u = UsageInfo(prompt_audio_seconds=None)
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 6.52μs -> 5.40μs (20.8% faster)

# 2. Edge Test Cases


def test_edge_extra_fields():
    """Test serialization when extra fields are present."""
    u = UsageInfo(prompt_tokens=1)
    # Simulate extra fields
    u.__pydantic_extra__ = {"custom_field": "extra_value"}
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 9.96μs -> 7.94μs (25.5% faster)

def test_edge_fields_set_to_zero_and_none():
    """Test serialization when fields are set to 0 and None."""
    u = UsageInfo(prompt_tokens=0, completion_tokens=None, total_tokens=0, prompt_audio_seconds=None)
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 7.23μs -> 5.81μs (24.4% faster)

def test_edge_fields_set_to_negative():
    """Test serialization with negative values (should be accepted as ints)."""
    u = UsageInfo(prompt_tokens=-1, completion_tokens=-2, total_tokens=-3, prompt_audio_seconds=-4)
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 6.47μs -> 5.21μs (24.2% faster)

def test_edge_fields_unusual_types():
    """Test serialization with unusual types (should raise validation error)."""
    with pytest.raises(pydantic.ValidationError):
        UsageInfo(prompt_tokens="not_an_int")


def test_large_scale_many_extra_fields():
    """Test serialization with many extra fields (up to 1000)."""
    u = UsageInfo(prompt_tokens=1)
    # Add 1000 extra fields
    u.__pydantic_extra__ = {f"extra_{i}": i for i in range(1000)}
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 68.4μs -> 34.0μs (101% faster)
    # All extra fields should be present
    for i in range(1000):
        pass

def test_large_scale_fields_set_to_various_values():
    """Test serialization with a variety of values for all fields."""
    u = UsageInfo(prompt_tokens=999, completion_tokens=888, total_tokens=777, prompt_audio_seconds=666)
    codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 7.17μs -> 5.71μs (25.6% faster)


def test_large_scale_multiple_instances():
    """Test serialization of many UsageInfo instances to check for memory leaks or state issues."""
    results = []
    for i in range(100):
        u = UsageInfo(prompt_tokens=i, completion_tokens=i+1, total_tokens=i+2, prompt_audio_seconds=i+3)
        codeflash_output = UsageInfo.serialize_model(u, default_handler); result = codeflash_output # 237μs -> 170μs (39.1% faster)
        results.append(result)
    # Ensure all results are unique and correct
    for i, r in enumerate(results):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

from typing import Any, Dict, Optional

import pydantic
# imports
import pytest
from mistralai.models.usageinfo import UsageInfo
from mistralai.types import UNSET, UNSET_SENTINEL, BaseModel, OptionalNullable
from pydantic import ConfigDict, model_serializer


# Helper function to simulate the handler
def default_handler(instance):
    # Simulate Pydantic's model dict serialization
    # Include extra fields if present
    base = dict(instance.__dict__)
    # Remove private and Pydantic internals
    base.pop('__pydantic_extra__', None)
    base.pop('__pydantic_fields_set__', None)
    # Add extra fields
    if hasattr(instance, '__pydantic_extra__'):
        base.update(instance.__pydantic_extra__)
    return base

# ---------------------- UNIT TESTS ----------------------

# 1. Basic Test Cases

def test_serialize_model_with_defaults():
    # Test serialization with all default values
    u = UsageInfo()
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 7.12μs -> 6.09μs (16.9% faster)

def test_serialize_model_with_all_fields_set():
    # Test serialization with all fields explicitly set
    u = UsageInfo(prompt_tokens=10, completion_tokens=20, total_tokens=30, prompt_audio_seconds=5)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 5.59μs -> 4.46μs (25.4% faster)

def test_serialize_model_with_some_fields_set():
    # Test serialization with some fields set, others default
    u = UsageInfo(prompt_tokens=1)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 6.59μs -> 5.25μs (25.5% faster)

def test_serialize_model_with_extra_fields():
    # Test serialization with extra fields (should be included)
    u = UsageInfo(prompt_tokens=2)
    u.__pydantic_extra__ = {'extra_field': 'extra_value'}
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 6.89μs -> 5.43μs (26.8% faster)

# 2. Edge Test Cases

def test_serialize_model_with_none_values():
    # Test serialization with None values
    u = UsageInfo(prompt_tokens=None, completion_tokens=None, total_tokens=None, prompt_audio_seconds=None)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 5.98μs -> 4.53μs (31.9% faster)

def test_serialize_model_with_unset_prompt_audio_seconds():
    # Test serialization with prompt_audio_seconds explicitly UNSET
    u = UsageInfo(prompt_audio_seconds=UNSET)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 6.76μs -> 5.33μs (26.8% faster)


def test_serialize_model_with_negative_values():
    # Test serialization with negative values
    u = UsageInfo(prompt_tokens=-1, completion_tokens=-2, total_tokens=-3, prompt_audio_seconds=-4)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 7.37μs -> 5.56μs (32.4% faster)

def test_serialize_model_with_large_integer_values():
    # Test serialization with very large integer values
    large_num = 10**18
    u = UsageInfo(prompt_tokens=large_num, completion_tokens=large_num, total_tokens=large_num, prompt_audio_seconds=large_num)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 5.96μs -> 4.71μs (26.4% faster)

def test_serialize_model_with_missing_fields():
    # Test serialization with missing fields (simulate by deleting attribute)
    u = UsageInfo()
    del u.prompt_tokens
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 7.38μs -> 6.09μs (21.0% faster)

def test_serialize_model_with_nonstandard_types():
    # Test serialization with non-standard types in extra fields
    u = UsageInfo()
    u.__pydantic_extra__ = {'custom_obj': object()}
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 6.59μs -> 5.69μs (15.8% faster)

# 3. Large Scale Test Cases

def test_serialize_model_many_extra_fields():
    # Test serialization with many extra fields
    u = UsageInfo(prompt_tokens=1)
    # Add 999 extra fields
    extras = {f'field_{i}': i for i in range(999)}
    u.__pydantic_extra__ = extras
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 63.5μs -> 31.7μs (100% faster)
    for i in range(999):
        pass

def test_serialize_model_large_values():
    # Test serialization with large values for all fields
    big = 2**60
    u = UsageInfo(prompt_tokens=big, completion_tokens=big, total_tokens=big, prompt_audio_seconds=big)
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 5.99μs -> 4.58μs (30.8% faster)

def test_serialize_model_large_scale_mixed():
    # Test serialization with large number of extra fields and large values
    big = 10**15
    u = UsageInfo(prompt_tokens=big, completion_tokens=big, total_tokens=big, prompt_audio_seconds=big)
    extras = {f'field_{i}': big + i for i in range(500)}
    u.__pydantic_extra__ = extras
    codeflash_output = u.serialize_model(default_handler); result = codeflash_output # 35.4μs -> 19.9μs (77.7% faster)
    for i in range(500):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
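
The generated tests above carry no assertions of their own; per the closing comment, correctness comes from comparing the output of the original code with that of the optimized code. A hypothetical sketch of that kind of check follows. It is not Codeflash's actual harness, and `check_equivalent`, `original_copy`, and `optimized_copy` are made-up names.

```python
from typing import Any, Callable, Iterable


def check_equivalent(original: Callable[[Any], Any],
                     optimized: Callable[[Any], Any],
                     inputs: Iterable[Any]) -> None:
    """Raise AssertionError if the two callables disagree on any input."""
    for value in inputs:
        expected = original(value)
        actual = optimized(value)
        assert actual == expected, (
            f"mismatch for {value!r}: {actual!r} != {expected!r}"
        )


def original_copy(d):
    # Key-by-key copy, standing in for the unoptimized code path.
    out = {}
    for k, v in d.items():
        out[k] = v
    return out


def optimized_copy(d):
    # Bulk copy, standing in for the optimized code path.
    out = {}
    out.update(d)
    return out


check_equivalent(original_copy, optimized_copy,
                 [{}, {"a": 1}, {"a": None, "b": 2}])
```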

To edit these changes, run git checkout codeflash/optimize-UsageInfo.serialize_model-mh4g4xi4 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 24, 2025 06:04
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 24, 2025